1 Overview

1.1 Learning Objectives

• Understand the basic steps of single cell RNA-Sequencing analysis workflows • Develop a baseline awareness of cellular heterogeneity both between and within cell ‘types’. • Learn to identify and examine cell state transitions via pseudotime analysis • Understanding the application of dimensionality reduction to visualization and high-dimensional sequencing data analysis.

1.2 Data

1.2.1 Description

1.2.2 Data

The dataset we are using is 10x 10k neurons from an E18 mouse (This is a large dataset ~25Gb).

Cells for this sample are from a combined cortex, hippocampus and sub ventricular zone of an E18 mouse. - 11,869 cells detected - Sequenced on Illumina NovaSeq with approximately 30,000 reads per cell - 28bp read1 (16bp Chromium barcode and 12bp UMI), 91bp read2 (transcript), and 8bp I7 sample barcode

10x Genomics sequencing workflow overview

Experimental questions: - What types of cells do we expect to find? - What/how many cellular ‘states’ do we observe at the transcriptional level? - How well defined are different cell types * What do ‘transitioning’ cell types/states look like? - Can we identify differentially expressed and/or marker genes between cell types? - What genes change expression over the course of neuronal development?

2 Library imports

2.1 Install packages

require('BiocManager')
## Loading required package: BiocManager
## Bioconductor version '3.12' is out-of-date; the current release version '3.14'
##   is available with R version '4.1'; see https://bioconductor.org/install
packages <- c("Matrix","monocle3","RcppML","biomaRt","tidyverse","celldex","SingleR")

BiocManager::install(setdiff(packages, rownames(installed.packages())))
## 'getOption("repos")' replaces Bioconductor standard repositories, see
## '?repositories' for details
## 
## replacement repositories:
##     CRAN: https://cloud.r-project.org
## Bioconductor version 3.12 (BiocManager 1.30.16), R 4.0.2 (2020-06-22)
## Installation paths not writeable, unable to update packages
##   path: /usr/lib/R/library
##   packages:
##     boot, class, cluster, codetools, foreign, KernSmooth, lattice, MASS,
##     Matrix, mgcv, nlme, nnet, rpart, spatial, survival
## Old packages: 'colorspace', 'conquer', 'densityClust', 'doParallel',
##   'evaluate', 'future', 'quantreg', 'RCurl', 'recipes', 'RSQLite',
##   'scattermore', 'seriation', 'tidyselect', 'tinytex', 'VGAM', 'xgboost',
##   'yaml'

2.2 Load packages

library(Matrix)
library(monocle3)
## Loading required package: Biobase
## Warning: package 'Biobase' was built under R version 4.0.3
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## The following object is masked from 'package:Biobase':
## 
##     rowMedians
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:Matrix':
## 
##     expand
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## 
## Attaching package: 'monocle3'
## The following objects are masked from 'package:Biobase':
## 
##     exprs, fData, fData<-, pData, pData<-
library(RcppML) # for nmf
library(biomaRt)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::collapse()   masks IRanges::collapse()
## x dplyr::combine()    masks Biobase::combine(), BiocGenerics::combine()
## x dplyr::count()      masks matrixStats::count()
## x dplyr::desc()       masks IRanges::desc()
## x tidyr::expand()     masks S4Vectors::expand(), Matrix::expand()
## x dplyr::filter()     masks stats::filter()
## x dplyr::first()      masks S4Vectors::first()
## x dplyr::lag()        masks stats::lag()
## x tidyr::pack()       masks Matrix::pack()
## x ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## x purrr::reduce()     masks GenomicRanges::reduce(), IRanges::reduce()
## x dplyr::rename()     masks S4Vectors::rename()
## x dplyr::select()     masks biomaRt::select()
## x dplyr::slice()      masks IRanges::slice()
## x tidyr::unpack()     masks Matrix::unpack()
library(celldex)
library(SingleR)
## 
## Attaching package: 'SingleR'
## The following objects are masked from 'package:celldex':
## 
##     BlueprintEncodeData, DatabaseImmuneCellExpressionData,
##     HumanPrimaryCellAtlasData, ImmGenData, MonacoImmuneData,
##     MouseRNAseqData, NovershternHematopoieticData

3 Importing and cleaning raw preprocessed data

Since the preprocessing is a time-consuming step, I’ve already performed this operation for you (you can see how this is done by looking at the scripts/10x_10k_preprocessing.sh bash script for reference). The important output of this preprocessing operation is a set of three matrices that contain information about the number of mRNAs (specifically Unique Molecular Identifiers (UMIs)) contained within each droplet of the 10x emulsion.

The first is the geneMetadata which describes any associated information for each gene within the reference transcriptome:

gene_id gene_short_name chromosome
gene1 ENSMUSG00000005583 Mef2c chr13
gene2 ENSMUSG00000097063 Pantr2 chr1
gene3 ENSMUSG00000045515 Pou3f3 chr1

…

The second is the cellMetadata which describes any associated information for each cell/droplet from our single cell sample:

barcode celltype scaleFactor
barcode1 AGCTATGCGATAGCTAC
barcode2 GCTACGCGATCGATCGA
barcode3 ATCATACTGCGATACGC

…

The third is the counts matrix which is a geneXcell matrix where the values represent the raw counts for each gene in each cell:

cell1 cell2 cell3
gene1 0 2 0
gene2 15 7 3
gene3 1 0 2

…

Since this pre-processing was performed earlier, I’ve made these three matrices available for direct download.

geneMetadata<-as.data.frame(read_csv(url("https://dl.dropboxusercontent.com/s/5wqxsxdck9d5unh/cells_x_genes.genes.txt?dl=0"),col_names=c("gene_id")))
## Rows: 55421 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): gene_id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cellMetadata<-as.data.frame(read_csv(url("https://dl.dropboxusercontent.com/s/icm1qf1z6ayvz89/cells_x_genes.barcodes.txt?dl=0"),col_names=c("barcode")))
## Rows: 11869 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): barcode
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
counts<-Matrix::readMM(url("https://dl.dropboxusercontent.com/s/hd5h7ihcrte0mlv/cells_x_genes.mtx?dl=0"))
counts<-t(counts) # matrix is provided in the wrong orientation so need to transpose
head(geneMetadata)
##                gene_id
## 1 ENSMUSG00000102693.1
## 2 ENSMUSG00000064842.1
## 3 ENSMUSG00000051951.5
## 4 ENSMUSG00000102851.1
## 5 ENSMUSG00000103377.1
## 6 ENSMUSG00000104017.1
head(cellMetadata)
##            barcode
## 1 AAACCCAAGCAACTCT
## 2 AAACCCACACGCGGTT
## 3 AAACCCACAGCATACT
## 4 AAACCCACATACCATG
## 5 AAACCCAGTCGCACAC
## 6 AAACCCAGTGCACATT
head(counts)[1:6,1:6]
## 6 x 6 sparse Matrix of class "dgTMatrix"
##                 
## [1,] . . . . . .
## [2,] . . . . . .
## [3,] . . . . . .
## [4,] . . . . . .
## [5,] . . . . . .
## [6,] . . . 1 . .
dim(geneMetadata)
## [1] 55421     1
dim(cellMetadata)
## [1] 11869     1
dim(counts)
## [1] 55421 11869

Most barcodes (cells) only have 0 or 1 UMIs detected.

tot_count <- Matrix::colSums(counts)
summary(tot_count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      10    3607    6373    7317    9905   52611

4 Monocle3 secondary analysis

Now that we have collected the count matrix and accessory matrices, we will need to organize these into a single cell analysis framework for downstream/secondary analysis. Here we will be using the monocle3 framework.

4.1 Create Monocle cell_data_set object from spliced matrix

The cell_data_set class defines how the single cell data are stored, indexed, manipulated, and sliced. The three components that we need to create a cell_data_set instance are 1) the sparse count matrix, 2) gene-level annotation, and 3) cell-level annotation. We don’t have much for #’s 2 or 3 at this point other than ids, but it’s enough to get started.

rownames(cellMetadata)<-cellMetadata$barcode
rownames(geneMetadata)<-geneMetadata$gene_id
colnames(counts)<-rownames(cellMetadata)
rownames(counts)<-rownames(geneMetadata)
dat <- monocle3::new_cell_data_set(counts,
                         cell_metadata = cellMetadata,
                         gene_metadata = geneMetadata)
## Warning in monocle3::new_cell_data_set(counts, cell_metadata = cellMetadata, :
## Warning: gene_metadata must contain a column verbatim named 'gene_short_name'
## for certain functions.

We have now created our base monocle3 cell_data_set object called dat. Lets peek around to see what’s inside:

dat
## class: cell_data_set 
## dim: 55421 11869 
## metadata(1): cds_version
## assays(1): counts
## rownames(55421): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
##   ENSMUSG00000096730.7 ENSMUSG00000095742.1
## rowData names(1): gene_id
## colnames(11869): AAACCCAAGCAACTCT AAACCCACACGCGGTT ... TTTGTTGGTTCAACGT
##   TTTGTTGTCCGTTTCG
## colData names(2): barcode Size_Factor
## reducedDimNames(0):
## altExpNames(0):

This is a summary report of dat.

To access the expression matrix we use the assay() method:

assay(dat,'counts')[1:6,1:6]
## 6 x 6 sparse Matrix of class "dgCMatrix"
##                      AAACCCAAGCAACTCT AAACCCACACGCGGTT AAACCCACAGCATACT
## ENSMUSG00000102693.1                .                .                .
## ENSMUSG00000064842.1                .                .                .
## ENSMUSG00000051951.5                .                .                .
## ENSMUSG00000102851.1                .                .                .
## ENSMUSG00000103377.1                .                .                .
## ENSMUSG00000104017.1                .                .                .
##                      AAACCCACATACCATG AAACCCAGTCGCACAC AAACCCAGTGCACATT
## ENSMUSG00000102693.1                .                .                .
## ENSMUSG00000064842.1                .                .                .
## ENSMUSG00000051951.5                .                .                .
## ENSMUSG00000102851.1                .                .                .
## ENSMUSG00000103377.1                .                .                .
## ENSMUSG00000104017.1                1                .                .

We can create multiple different ‘assays’ in the same dataset that might reflect different transformations of the raw ‘counts’ data (e.g. ‘logcounts’,‘normcounts’, etc). Each of these must have the same dimensions as the ‘count’ assay.

assay(dat,'logcounts')<-log10(assay(dat,'counts')+1)

To access the cell annotations we use the colData() method:

colData(dat)
## DataFrame with 11869 rows and 2 columns
##                           barcode Size_Factor
##                       <character>   <numeric>
## AAACCCAAGCAACTCT AAACCCAAGCAACTCT    0.210892
## AAACCCACACGCGGTT AAACCCACACGCGGTT    0.999615
## AAACCCACAGCATACT AAACCCACAGCATACT    0.556600
## AAACCCACATACCATG AAACCCACATACCATG    0.642584
## AAACCCAGTCGCACAC AAACCCAGTCGCACAC    1.575145
## ...                           ...         ...
## TTTGTTGGTAGCTAAA TTTGTTGGTAGCTAAA    1.671037
## TTTGTTGGTATCCCAA TTTGTTGGTATCCCAA    0.740246
## TTTGTTGGTCCGAAAG TTTGTTGGTCCGAAAG    0.417361
## TTTGTTGGTTCAACGT TTTGTTGGTTCAACGT    1.045792
## TTTGTTGTCCGTTTCG TTTGTTGTCCGTTTCG    0.754399

To access the gene annotations we use the rowData() method:

rowData(dat)
## DataFrame with 55421 rows and 1 column
##                                   gene_id
##                               <character>
## ENSMUSG00000102693.1 ENSMUSG00000102693.1
## ENSMUSG00000064842.1 ENSMUSG00000064842.1
## ENSMUSG00000051951.5 ENSMUSG00000051951.5
## ENSMUSG00000102851.1 ENSMUSG00000102851.1
## ENSMUSG00000103377.1 ENSMUSG00000103377.1
## ...                                   ...
## ENSMUSG00000094431.1 ENSMUSG00000094431.1
## ENSMUSG00000094621.1 ENSMUSG00000094621.1
## ENSMUSG00000098647.1 ENSMUSG00000098647.1
## ENSMUSG00000096730.7 ENSMUSG00000096730.7
## ENSMUSG00000095742.1 ENSMUSG00000095742.1

There’s not much annotation in there yet for either the genes or the cells, lets see what we can add.

4.2 Add gene-level annotation from BioMaRt

Using the gene_id information in the featureData slot, we can fetch external annotations for each gene and merge them so we can get more meaningful gene information. biomaRt is an interface in Bioconductor to get information associated with various types of gene_ids.

mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                         dataset = "mmusculus_gene_ensembl",
                         host = 'ensembl.org')
t2g <- biomaRt::getBM(attributes = c("ensembl_gene_id",
                                     "external_gene_name",
                                     "chromosome_name",
                                     "start_position",
                                     "end_position"
                                     ), mart = mart) # Fetch annotation information for all gene_ids

fData(dat)$gene_id_trimmed<-str_split_fixed(fData(dat)$gene_id,pattern="\\.",2)[,1] #Trim off the version identifier from the gene_ids
fData(dat)<-dplyr::left_join(as.data.frame(fData(dat)),t2g,by=c("gene_id_trimmed" = "ensembl_gene_id"),sort=FALSE,all.x=TRUE,keep=TRUE) # merge annotation into existing fData().
fData(dat)$gene_short_name<-fData(dat)$external_gene_name # make a field named "gene_short_name" in fData()
head(fData(dat)) #Inspect
## DataFrame with 6 rows and 8 columns
##                                   gene_id    gene_id_trimmed    ensembl_gene_id
##                               <character>        <character>        <character>
## ENSMUSG00000102693.1 ENSMUSG00000102693.1 ENSMUSG00000102693 ENSMUSG00000102693
## ENSMUSG00000064842.1 ENSMUSG00000064842.1 ENSMUSG00000064842 ENSMUSG00000064842
## ENSMUSG00000051951.5 ENSMUSG00000051951.5 ENSMUSG00000051951 ENSMUSG00000051951
## ENSMUSG00000102851.1 ENSMUSG00000102851.1 ENSMUSG00000102851 ENSMUSG00000102851
## ENSMUSG00000103377.1 ENSMUSG00000103377.1 ENSMUSG00000103377 ENSMUSG00000103377
## ENSMUSG00000104017.1 ENSMUSG00000104017.1 ENSMUSG00000104017 ENSMUSG00000104017
##                      external_gene_name chromosome_name start_position
##                             <character>     <character>      <integer>
## ENSMUSG00000102693.1      4933401J01Rik               1        3143476
## ENSMUSG00000064842.1            Gm26206               1        3172239
## ENSMUSG00000051951.5               Xkr4               1        3276124
## ENSMUSG00000102851.1            Gm18956               1        3322980
## ENSMUSG00000103377.1            Gm37180               1        3435954
## ENSMUSG00000104017.1            Gm37363               1        3445779
##                      end_position gene_short_name
##                         <integer>     <character>
## ENSMUSG00000102693.1      3144545   4933401J01Rik
## ENSMUSG00000064842.1      3172348         Gm26206
## ENSMUSG00000051951.5      3741721            Xkr4
## ENSMUSG00000102851.1      3323459         Gm18956
## ENSMUSG00000103377.1      3438772         Gm37180
## ENSMUSG00000104017.1      3448011         Gm37363

Now we have a bit more useful information associated with each gene.

4.3 Monocle3 preprocessing

The term ‘preprocessing’ comes up again even though we’re past the initial hurdle of generating the matrix. Standard secondary preprocessing for single cell RNA-Seq involves projecting expression data into the top principal components to identify ranked sources of variation. This is usually done after log-transforming the data to stabilize the variance across the dynamic range of gene expression. Monocle3 conveniently provides a function preprocess_cds() that will do this transform and PCA analysis. We start with a relatively high number of principal components to estimate, 100.

dat <- preprocess_cds(dat,
                      num_dim = 40,
                      method = "PCA",
                      norm_method = "log",
                      verbose=T,
                      cores=4)
## Remove noise by PCA ...
plot_pc_variance_explained(dat)

If we look at the variance explained for each principle component, we can see that it starts to trail off after a point. We probably won’t get much more useful information after 20 or so components. So we subset to only the first 20 components and preprocess again.

nDims <- 20 # or 20 works here too downstream
dat <- preprocess_cds(dat,
                      num_dim = nDims,
                      method = "PCA",
                      norm_method = "log",
                      verbose=T,
                      cores=4)
## Remove noise by PCA ...

4.4 QC metrics

Next we can actually start to assess some of the important quality metrices for each gene and each cell.

4.4.1 Gene-wise QC metrics

4.4.1.1 Minimum number of cells expressing a given gene

It’s a good idea, and saves time/effort to identify and only consider genes whose expression levels are detected in a certain number or proportion of cells within your assay.

dat<-detect_genes(dat)
cellCutoff<-20 # This number is arbitrary
expressed_genes <- row.names(subset(rowData(dat),
    num_cells_expressed >= cellCutoff))

length(expressed_genes)
## [1] 23614

Once we’ve detected the number of cells expressing each gene, we can look at the distribution to get a better feel

hist(rowData(dat)$num_cells_expressed,col="red",breaks=50,main="Number of cells expressing a given gene")

Most genes are not detectably expressed in more than one cell (this is the nature of single cell gene expression assays, and gene expression in general)

Lets log transform and look again. This time, we’ll add a threshold line showing our cutoff for ‘expressed genes’.

hist(log10(rowData(dat)$num_cells_expressed),col="red",breaks=50,main="log10 Number of cells expressing a given gene")
abline(v=log10(cellCutoff),lty="dashed")

We have now identified a total of 23614 that are detectably expressed in at least 20 cells in our dataset.

4.4.1.2 Distribution of gene mean copies per cell

What is the average expression level (in mRNA Copies per cell) for each gene?

rowData(dat)$mean_cpc<-Matrix::rowMeans(assay(dat))
hist(log10(rowData(dat[expressed_genes,])$mean_cpc),col="purple",breaks=50,main="Mean RNA copies per cell")

* What can this distribution tell us about the average mRNA expression/abundance of a gene? about single cell RNA-Seq efficiency in general?

  • Bonus: What is the ‘expected’ number of mRNA copies per cell for an ‘average’ gene?

4.4.2 Cell QC metrics

4.4.2.1 Distribution of detected genes across cells

How many genes are expressed in a given cell?

hist(colData(dat)$num_genes_expressed,col="darkgreen",breaks=50,main="Number of genes expressed per cell")

4.4.2.2 Mt-genome proportion

A high proportion of mitochondrial genes may indicate a lower than ideal capture efficiency for a given cell. Here we identify the subset of mitochondrial genes and look at the proportion of reads mapping to ‘mt-’ genes vs genomic genes.

mito_genes<-rowData(dat)$gene_id[grepl("^mt-",rowData(dat)$gene_short_name)]

colData(dat)$mt_reads <- Matrix::colSums(assay(dat)[mito_genes,])
colData(dat)$total_reads  <- Matrix::colSums(assay(dat))
colData(dat)$mito_ratio <- colData(dat)$mt_reads/colData(dat)$total_reads

ggplot(as.data.frame(colData(dat)),
       aes(x = num_genes_expressed, y = mito_ratio)) +
       geom_point() +
       labs(x = "Number of genes", y = "Mitochondrial ratio") +
       scale_color_brewer(palette = "Set1") +
       theme(legend.position = "none") +
       ggtitle("Number of genes vs Mitochondrial ratio") +
       monocle3:::monocle_theme_opts()

Indeed we see that the proportion of MT reads for some cells increases as the total information content decreases (ie. lower ‘quality’ cells have a greater fraction of MT reads). Often it can be advisable to set a threshold (~20%) to eliminate lower quality cells.

  • When might you not want to apply a threshold for MT-ratio?

4.4.2.3 Total mRNAs per cell

To get a general picture of the capture efficiency and depth of information for each cell we can look at the total mRNA mass recovered per cell.

colData(dat)$Total_mRNA<-Matrix::colSums(assay(dat))

hist(colData(dat)$Total_mRNA,col="darkblue",breaks=50,main="Total mRNAs sequenced per cell")

  • What shape is this distribution?

  • What cellular features might be correlated with total mRNA abundance?

Bonus question: How many mRNAs do we expect are in a given eukaryotic cell?

For each of the above QC ‘criterion’ we can define thresholds that can be used to filter cells/genes to improve the quality of the dataset. Here is usually where obvious doublet cells (more than one cell is associated with a single barcode sequence) or low-quality cells are removed prior to doing any further statistical interpretations.

5 Reduce dimensionality to visualize the cell relationships

Now we’re ready to visualize the cells. To do so, you can use either t-SNE, which is very popular in single-cell RNA-seq, or UMAP, which is increasingly common. Monocle 3 uses UMAP by default, since it is both faster and better suited for clustering and trajectory analysis in RNA-seq. To reduce the dimensionality of the data down into the X, Y plane so we can plot it easily, we call reduce_dimension():

dat <- reduce_dimension(dat,
                        verbose=TRUE,
                        reduction_method="UMAP",
                        cores = 4)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
## Running Uniform Manifold Approximation and Projection
## 11:48:59 UMAP embedding parameters a = 1.577 b = 0.8951
## 11:48:59 Read 11869 rows and found 20 numeric columns
## 11:49:01 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:49:03 Writing NN index file to temp file /tmp/Rtmp5G7417/filea0c2d7e66f945
## 11:49:03 Searching Annoy index using 4 threads, search_k = 1500
## 11:49:04 Annoy recall = 100%
## 11:49:06 Commencing smooth kNN distance calibration using 4 threads
## 11:49:09 Initializing from normalized Laplacian + noise
## 11:49:09 Commencing optimization for 200 epochs, with 235650 positive edges
## 11:49:16 Optimization finished

To visualize the dimensionality reduction we use plot_cells():

plot_cells(dat)
## No trajectory to plot. Has learn_graph() been called yet?
## cluster not found in colData(cds), cells will not be colored
## cluster_cells() has not been called yet, can't color cells by cluster

And we can look at how different features of the cells are distributed across this embedding. For now we only have technical features to view.

plot_cells(dat,color_cells_by="num_genes_expressed",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Cells aren't colored in a way that allows them to be grouped.

plot_cells(dat,color_cells_by="Total_mRNA",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Cells aren't colored in a way that allows them to be grouped.

plot_cells(dat,color_cells_by="mito_ratio",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Cells aren't colored in a way that allows them to be grouped.

What can we deduce/hypothesize about the embedding (we will formally test later)? Number of cell types? Diversity of cell types?

5.0.1 Embedding shapes

  • Sometimes, clearly defined cell types are obvious puncta in a 2D embedding
  • Other times, what you think of as a single cell type may be broken up into several ‘subtypes’
  • Still more, a single cell ‘type’ may consist of several ‘cell states’ that might present as more of an amorphous shape in an embedding
  • Cells in an ergodic transitioning state may be represented as a ‘pseeudotemporal trajectory’ as different cells pass through different phases of the transition. - These trajectories can be very useful as a high-resolution timecourse for how cells respond to changes or cues.
  • What types of shape:stories might be represented in these clusters? What cell types/states do we expect to find in an E18.5 developing mouse cortex?

We can also map the expression level of individual genes (markers) onto this embedding to start to parse out meaning.

plot_cells(dat,genes=c("Sox9","Gad1","Slc17a6","Slc17a7"),cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

plot_cells(dat,genes=c("Pax6","Eomes","Fezf2","Tle4","Satb2","Pou3f2"),cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

6 Cluster similar cells into ‘cell types’

We next want to impose a clustering solution onto this embedding to group cells with similar transcriptional profiles together. We use cluster_cells() to perform this function in monocle3:

dat <- cluster_cells(dat,
                     verbose=TRUE,
                     resolution=5e-5)
## Running leiden clustering algorithm ...
## Run kNN based graph clustering starts:
##   -Input data of 11869 rows and 2 columns
##   -k is set to 20
##   Finding nearest neighbors...
##     DONE. Run time: 0.0830000000000268s
##   Compute jaccard coefficient between nearest-neighbor sets ...
##     DONE. Run time: 0.00599999999997181s
##   Build undirected graph from the weighted links ...
##     DONE. Run time: 0.132000000000005s
##   Run leiden clustering ...
## leiden_find_partition: partition_type       CPMVertexPartition
## leiden_find_partition: seed                 1497170604
## leiden_find_partition: resolution_parameter 5e-05
## leiden_find_partition: num_iter             2
## leiden_find_partition: initial_membership   0
## leiden_find_partition: edge_weights         0
## leiden_find_partition: node_sizes           0
## leiden_find_partition: number vertices      11869
## leiden_find_partition: number edges         237380
##     Current resolution is 5e-05; Modularity is 0.813080987389074; Quality is 471883.6682; Significance is 265521.560607798; Number of clusters is 12
##     Done. Run time: 0.256078481674194s
##   Clustering statistics
##    resolution_parameter  quality modularity significance cluster_count selected
##                   5e-05 471883.7   0.813081     265521.6            12        *
## 
##   Cell counts by cluster
##    cluster cell_count cell_fraction
##          1       3451         0.291
##          2       2682         0.226
##          3       1724         0.145
##          4       1544         0.130
##          5        911         0.077
##          6        587         0.049
##          7        328         0.028
##          8        292         0.025
##          9        117         0.010
##         10         93         0.008
##         11         84         0.007
##         12         56         0.005
## 
##   Maximal modularity is 0.813080987389074 for resolution parameter 5e-05
## 
##   Run kNN based graph clustering DONE.
##   -Number of clusters: 12
plot_cells(dat, color_cells_by="cluster", cell_size=0.75, group_label_size = 5, show_trajectory_graph=FALSE)

Remember, clustering is a useful tool but is also arbitrary for continuous cell-cell transitions.

7 De Novo cell type annotation

As a crude first-pass annotation, we can leverage publicly available, bulk RNA-Seq gene expression information for specific cell types to try and ‘learn’ cell type annotations for each cell. This ‘transfer learning’ approach can provide a resonable starting point for coarse cell type identification. The SingleR package leverages reference transcriptomic datasets of pure cell types to infer the cell of origin of each of the single cells independently. First we need to fetch a reference dataset of bulk RNA-Seq expression profiles.

mouse.rnaseq <- celldex::MouseRNAseqData(ensembl = TRUE)
## snapshotDate(): 2020-10-27
## see ?celldex and browseVignettes('celldex') for documentation
## loading from cache
## see ?celldex and browseVignettes('celldex') for documentation
## loading from cache
## snapshotDate(): 2020-10-27
## loading from cache
## require("ensembldb")
## Warning: Unable to map 2180 of 21214 requested IDs.

And then, we can use this dataset to compare our single cell expression profiles against to apply coarse cell type labels.

dat.exprs<-assay(dat[expressed_genes,])
rownames(dat.exprs) <- str_remove(rownames(dat.exprs), "\\.\\d+")

system.time(annots<-SingleR(dat.exprs,
                ref = mouse.rnaseq, labels = colData(mouse.rnaseq)$label.fine,
                de.method = "classic", method = "single", BPPARAM = BiocParallel::MulticoreParam(4))
)
## Warning: 'method="cluster"' is no longer necessary when 'cluster=' is specified
##    user  system elapsed 
## 108.414  17.689  59.263

7.1 Apply learned annotations to cell_data_set object

Once we’ve learned labels for each cell, we can then add this information into the phenotypeData (pData()) slot of our main object.

colData(dat)$cell_type <- annots$pruned.labels
plot_cells(dat,color_cells_by="cell_type", group_cells_by = "cluster", cell_size=0.75, label_cell_groups = FALSE)
## No trajectory to plot. Has learn_graph() been called yet?

table(pData(dat)$cell_type)
## 
##                 aNSCs            Astrocytes               B cells 
##                   189                     7                     1 
##     Endothelial cells             Ependymal          Erythrocytes 
##                   108                     2                   140 
##           Fibroblasts Fibroblasts activated Fibroblasts senescent 
##                    19                    24                     5 
##          Granulocytes           Macrophages             Microglia 
##                     2                     4                    37 
##             Monocytes               Neurons     Neurons activated 
##                     2                  2176                     2 
##                  NPCs      Oligodendrocytes                  OPCs 
##                  8356                     2                   334 
##                 qNSCs 
##                    20

The algorithm does a reasonable job at identifying major cell types, and identifies a number of different cell types in our dataset. Before we go any further, lets filter the dataset down to only cell types that we might be interested in for downstream analysis.

inds <- annots$pruned.labels %in% c("NPCs", "Neurons", "Neurons activated", "OPCs", "Oligodendrocytes",
                                    "qNSCs", "aNSCs", "Astrocytes", "Ependymal", "Microglia")
# Only keep these cell types
cells_use <- row.names(annots)[inds]
dat <- dat[,cells_use]
table(colData(dat)$cell_type)
## 
##             aNSCs        Astrocytes         Ependymal         Microglia 
##               189                 7                 2                37 
##           Neurons Neurons activated              NPCs  Oligodendrocytes 
##              2176                 2              8356                 2 
##              OPCs             qNSCs 
##               334                20
# Preprocess again on filtered dataset
nDims <- 20

dat <- preprocess_cds(dat,
                      num_dim = nDims,
                      method = "PCA",
                      verbose=TRUE)
## Remove noise by PCA ...
dat <- reduce_dimension(dat,
                        verbose=TRUE,
                        reduction_method="UMAP",
                        umap.n_neighbors=20,
                        cores = 4)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
## Running Uniform Manifold Approximation and Projection
## 11:52:07 UMAP embedding parameters a = 1.577 b = 0.8951
## 11:52:07 Read 11125 rows and found 20 numeric columns
## 11:52:07 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:52:09 Writing NN index file to temp file /tmp/Rtmp5G7417/filea0c2d656a1232
## 11:52:09 Searching Annoy index using 4 threads, search_k = 2000
## 11:52:10 Annoy recall = 100%
## 11:52:12 Commencing smooth kNN distance calibration using 4 threads
## 11:52:17 Initializing from normalized Laplacian + noise
## 11:52:17 Commencing optimization for 200 epochs, with 294852 positive edges
## 11:52:25 Optimization finished
dat <- cluster_cells(dat,
                     verbose=TRUE,
                     resolution=1e-5)
## Running leiden clustering algorithm ...
## Run kNN based graph clustering starts:
##   -Input data of 11125 rows and 2 columns
##   -k is set to 20
##   Finding nearest neighbors...
##     DONE. Run time: 0.0949999999999704s
##   Compute jaccard coefficient between nearest-neighbor sets ...
##     DONE. Run time: 0.00499999999999545s
##   Build undirected graph from the weighted links ...
##     DONE. Run time: 0.0780000000000314s
##   Run leiden clustering ...
## leiden_find_partition: partition_type       CPMVertexPartition
## leiden_find_partition: seed                 705020546
## leiden_find_partition: resolution_parameter 1e-05
## leiden_find_partition: num_iter             2
## leiden_find_partition: initial_membership   0
## leiden_find_partition: edge_weights         0
## leiden_find_partition: node_sizes           0
## leiden_find_partition: number vertices      11125
## leiden_find_partition: number edges         222500
##     Current resolution is 1e-05; Modularity is 0.528813572336826; Quality is 444327.18634; Significance is 113267.313282539; Number of clusters is 6
##     Done. Run time: 0.22643780708313s
## 
##   Clustering statistics
##    resolution_parameter  quality modularity significance cluster_count selected
##                   1e-05 444327.2  0.5288136     113267.3             6        *
## 
##   Cell counts by cluster
##    cluster cell_count cell_fraction
##          1       7251         0.652
##          2       1672         0.150
##          3       1654         0.149
##          4        415         0.037
##          5        106         0.010
##          6         27         0.002
## 
##   Maximal modularity is 0.528813572336826 for resolution parameter 1e-05
## 
##   Run kNN based graph clustering DONE.
##   -Number of clusters: 6
plot_cells(dat,color_cells_by="cell_type", group_cells_by = "cluster", cell_size=0.75, label_cell_groups = FALSE) + scale_color_brewer(palette="Set1")
## No trajectory to plot. Has learn_graph() been called yet?
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

plot_cells(dat,color_cells_by="cluster",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?

plot_cells(dat,color_cells_by="partition",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?

plot_cells(dat,genes=c("Pax6","Eomes","Fezf2","Tle4","Satb2","Pou3f2"),cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

8 Finding marker genes

One of our objectives is to find marker genes expressed by each cluster/cell type. Once cells have been clustered, we can ask what genes makes them different from one another. To do that, start by calling the top_markers() function:

system.time(
  marker_test_res <- top_markers(dat, group_cells_by="cluster",
                               reference_cells=500, cores=4)
  )
top_specific_markers <- marker_test_res %>%
                           dplyr::filter(fraction_expressing >= 0.30) %>%
                            group_by(cell_group) %>%
                            top_n(5, pseudo_R2)

top_specific_marker_ids <- unique(top_specific_markers %>% pull(gene_id))

plot_genes_by_group(dat,
                    top_specific_marker_ids,
                    group_cells_by="cluster",
                    ordering_type="maximal_on_diag",
                    max.size=4)
## Warning in if (axis_order == "marker_group") {: the condition has length > 1 and
## only the first element will be used

plot_cells(dat,color_cells_by="cluster",cell_size=0.75, group_label_size = 8)
## No trajectory to plot. Has learn_graph() been called yet?

9 Pseudotime analysis

Pseudotime is a measure of how much progress an individual cell has made through a process such as cell differentiation. Since we have at least one population of cells that is transitioning from a progenitor state to a series of mature neurons, can we identify potential differentiation trajectories?

9.1 Learn trajectory graph

First we must learn a ‘trajectory graph’ across each partition (contiguous group of cells) in the data.

dat <- learn_graph(dat)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## Warning in min(data_df$weight[data_df$weight > 0]): no non-missing arguments to
## min; returning Inf
plot_cells(dat,
           color_cells_by = "cell_type",
           label_groups_by_cluster=FALSE,
           label_leaves=TRUE,
           label_branch_points=TRUE,
           label_principal_points=TRUE,
           label_cell_groups = FALSE,
           graph_label_size=3,
           cell_size=0.75) + scale_color_brewer(palette="Set1")
## Warning: `select_()` was deprecated in dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

dat <- order_cells(dat,
                   root_pr_nodes="Y_1")

plot_cells(dat,
           color_cells_by = "pseudotime",
           label_cell_groups=FALSE,
           label_leaves=FALSE,
           label_branch_points=FALSE,
           graph_label_size=3,
           cell_size=0.75)

Here we are also setting the ‘root’ of this trajectory to begin at a node in the NPC cells.

9.2 Differential gene expression with respect to pseudotime

How do we find the genes that are differentially expressed on the different paths through the trajectory? How do we find the ones that are restricted to the beginning of the trajectory? Or excluded from it? Monocle3 uses a ‘principal graph test’ to test whether cells at similar positions on the trajectory have correlated expression (This can take some time so we don’t necessarily have to run it in real time).

Here are some high-scoring differentially-expressed genes along the pseudotime trajectory in their UMAP embedding

pseudotime_genes <- c("Pax6","Btg2","Snap25","Fezf2","Hes6","Cux1","Eomes","Sox9")
#pseudotime_genes <- head(pseudotime_pr_test_res[order(pseudotime_pr_test_res$q_value),])$external_gene_name
plot_cells(dat, genes=pseudotime_genes,
           show_trajectory_graph=TRUE,
           label_cell_groups=FALSE,
           label_leaves=FALSE,
           cell_size=0.5)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

We can also explicitly look at how the expression of these genes along the pseudotime trajectory. The function plot_genes_in_pseudotime() takes a small set of genes and shows you their dynamics as a function of pseudotime:

pseudotime_lineage_cds <- dat[fData(dat)$gene_short_name %in% pseudotime_genes,
                       pData(dat)$cell_type %in% c("NPCs","Neurons")]

plot_genes_in_pseudotime(pseudotime_lineage_cds,
                         color_cells_by="cell_type",
                         min_expr=0.5,
                         ncol=2)

9.2.1 Bonus: 3D trajectories

dat_3d <- reduce_dimension(dat, max_components = 3, cores=4)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
dat_3d <- cluster_cells(dat_3d,resolution = 1e-5)
dat_3d <- learn_graph(dat_3d)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
#dat_3d <- order_cells(dat_3d)

plot_cells_3d(dat_3d, color_cells_by="cell_type")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

10 Pattern Discovery (NMF)

Finally, we can step beyond marker gene analysis, and use some ‘latent space’ discovery methods to learn patterns of co-regulated gene expression.

These patterns can be used to identify/define: * Cell type identities * Biological processes * Spatial gradients * Other cellular features

… often with significantly greater resolution and precision than marker genes.

Importantly, these patterns are ‘data driven’ and often yield insights into the heterogeneity and complexity of a given dataset that were unanticipated.

seed<-19790119
nPatterns<-30
# Takes about ~3-4 minutes...
system.time(dat.nnmf<-RcppML::nmf(assay(dat,'logcounts')[expressed_genes,], 
                                  k = nPatterns,
                                  tol = 1e-04,
                                  maxit = 100,
                                  seed = seed,
                                  verbose = TRUE,
                                  diag = TRUE,
                                  nonneg = TRUE
                                  )
            )
## 
## iter |      tol 
## ---------------
##    1 | 8.88e-01
##    2 | 9.68e-02
##    3 | 2.23e-02
##    4 | 9.17e-03
##    5 | 5.14e-03
##    6 | 3.26e-03
##    7 | 2.28e-03
##    8 | 1.69e-03
##    9 | 1.27e-03
##   10 | 9.69e-04
##   11 | 7.79e-04
##   12 | 6.58e-04
##   13 | 5.71e-04
##   14 | 5.04e-04
##   15 | 4.47e-04
##   16 | 3.85e-04
##   17 | 3.19e-04
##   18 | 2.57e-04
##   19 | 2.08e-04
##   20 | 1.69e-04
##   21 | 1.40e-04
##   22 | 1.19e-04
##   23 | 1.04e-04
##   24 | 9.26e-05
##    user  system elapsed 
## 201.075  10.857 214.648
# Gene x Pattern matrix
dim(dat.nnmf$w)
## [1] 23614    30
# Pattern x Cell matrix
dim(dat.nnmf$h)
## [1]    30 11125
#Add patterns to phenotype data for visualization
patterns.df<-data.frame(t(dat.nnmf$h))
colnames(patterns.df)<-paste0("Pattern_",c(1:nPatterns))

colData(dat)<-cbind(colData(dat),patterns.df)

pdf("figures/patterns.pdf",width=5,height=5)
lapply(c(1:nPatterns),function(i){
plot_cells(dat, color_cells_by=paste0("Pattern_",i), cell_size=0.75) +
    ggtitle(paste0("Pattern_",i))  +
    coord_equal(1)
})
## [[1]]
## 
## [[2]]
## 
## [[3]]
## 
## [[4]]
## 
## [[5]]
## 
## [[6]]
## 
## [[7]]
## 
## [[8]]
## 
## [[9]]
## 
## [[10]]
## 
## [[11]]
## 
## [[12]]
## 
## [[13]]
## 
## [[14]]
## 
## [[15]]
## 
## [[16]]
## 
## [[17]]
## 
## [[18]]
## 
## [[19]]
## 
## [[20]]
## 
## [[21]]
## 
## [[22]]
## 
## [[23]]
## 
## [[24]]
## 
## [[25]]
## 
## [[26]]
## 
## [[27]]
## 
## [[28]]
## 
## [[29]]
## 
## [[30]]
dev.off()
## png 
##   2
targetPattern<-7
plot_cells(dat, color_cells_by=paste0("Pattern_",targetPattern), cell_size=0.75) +
    ggtitle(paste0("Pattern ",targetPattern))  +
    coord_equal(1)

10.1 Genes associated with learned patterns

geneWeights.df<-data.frame(dat.nnmf$w)
colnames(geneWeights.df)<-paste0("Pattern_",c(1:nPatterns))

#fData(dat)[head(rownames(dat.nnmf$W)[order(dat.nnmf$W[,patternOfInterest],decreasing=TRUE)]),]
tmp<-as.data.frame(cbind(rowData(dat)[expressed_genes,c("gene_id","gene_short_name")],geneWeights.df))

targetPatterns<-c(5,9,10,23,targetPattern)
DT::datatable(tmp[,c("gene_id","gene_short_name",
                     unlist(lapply(targetPatterns,function(i){paste0("Pattern_",i)
                       }))
                       )])
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html

11 Session Information

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ensembldb_2.14.1            AnnotationFilter_1.14.0    
##  [3] GenomicFeatures_1.42.3      AnnotationDbi_1.52.0       
##  [5] SingleR_1.4.1               celldex_1.0.0              
##  [7] forcats_0.5.1               stringr_1.4.0              
##  [9] dplyr_1.0.8                 purrr_0.3.4                
## [11] readr_2.1.2                 tidyr_1.2.0                
## [13] tibble_3.1.6                ggplot2_3.3.5              
## [15] tidyverse_1.3.1             biomaRt_2.46.3             
## [17] RcppML_0.3.7                monocle3_1.0.0             
## [19] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [21] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
## [23] IRanges_2.24.1              S4Vectors_0.28.1           
## [25] MatrixGenerics_1.2.1        matrixStats_0.61.0         
## [27] Biobase_2.50.0              BiocGenerics_0.36.1        
## [29] Matrix_1.4-0                BiocManager_1.30.16        
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1                  backports_1.4.1              
##   [3] AnnotationHub_2.22.1          BiocFileCache_1.14.0         
##   [5] plyr_1.8.6                    igraph_1.2.11                
##   [7] lazyeval_0.2.2                splines_4.0.2                
##   [9] crosstalk_1.2.0               BiocParallel_1.24.1          
##  [11] digest_0.6.29                 htmltools_0.5.2              
##  [13] viridis_0.6.2                 fansi_1.0.2                  
##  [15] magrittr_2.0.2                memoise_2.0.1                
##  [17] tzdb_0.2.0                    Biostrings_2.58.0            
##  [19] modelr_0.1.8                  vroom_1.5.7                  
##  [21] askpass_1.1                   prettyunits_1.1.1            
##  [23] colorspace_2.0-2              blob_1.2.2                   
##  [25] rvest_1.0.2                   rappdirs_0.3.3               
##  [27] ggrepel_0.9.1                 haven_2.4.3                  
##  [29] xfun_0.29                     crayon_1.5.0                 
##  [31] RCurl_1.98-1.5                jsonlite_1.7.3               
##  [33] glue_1.6.1                    gtable_0.3.0                 
##  [35] zlibbioc_1.36.0               XVector_0.30.0               
##  [37] DelayedArray_0.16.3           BiocSingular_1.6.0           
##  [39] scales_1.1.1                  DBI_1.1.2                    
##  [41] Rcpp_1.0.8                    viridisLite_0.4.0            
##  [43] xtable_1.8-4                  progress_1.2.2               
##  [45] proxy_0.4-26                  bit_4.0.4                    
##  [47] rsvd_1.0.5                    DT_0.20                      
##  [49] htmlwidgets_1.5.4             httr_1.4.2                   
##  [51] speedglm_0.3-3                RColorBrewer_1.1-2           
##  [53] ellipsis_0.3.2                pkgconfig_2.0.3              
##  [55] XML_3.99-0.8                  farver_2.1.0                 
##  [57] sass_0.4.0                    uwot_0.1.11                  
##  [59] dbplyr_2.1.1                  utf8_1.2.2                   
##  [61] tidyselect_1.1.1              labeling_0.4.2               
##  [63] rlang_1.0.1                   reshape2_1.4.4               
##  [65] later_1.3.0                   pbmcapply_1.5.0              
##  [67] munsell_0.5.0                 BiocVersion_3.12.0           
##  [69] cellranger_1.1.0              tools_4.0.2                  
##  [71] cachem_1.0.6                  cli_3.2.0                    
##  [73] generics_0.1.2                RSQLite_2.2.9                
##  [75] ExperimentHub_1.16.1          broom_0.7.12                 
##  [77] evaluate_0.14                 fastmap_1.1.0                
##  [79] yaml_2.3.4                    RhpcBLASctl_0.21-247.1       
##  [81] knitr_1.37                    bit64_4.0.5                  
##  [83] fs_1.5.2                      RANN_2.6.1                   
##  [85] sparseMatrixStats_1.2.1       mime_0.12                    
##  [87] grr_0.9.5                     xml2_1.3.3                   
##  [89] compiler_4.0.2                rstudioapi_0.13              
##  [91] plotly_4.10.0                 curl_4.3.2                   
##  [93] interactiveDisplayBase_1.28.0 reprex_2.0.1                 
##  [95] bslib_0.3.1                   stringi_1.7.6                
##  [97] highr_0.9                     RSpectra_0.16-0              
##  [99] lattice_0.20-41               ProtGenerics_1.22.0          
## [101] vctrs_0.3.8                   pillar_1.7.0                 
## [103] lifecycle_1.0.1               jquerylib_0.1.4              
## [105] RcppAnnoy_0.0.19              BiocNeighbors_1.8.2          
## [107] data.table_1.14.2             bitops_1.0-7                 
## [109] irlba_2.3.5                   Matrix.utils_0.9.8           
## [111] httpuv_1.6.5                  rtracklayer_1.50.0           
## [113] R6_2.5.1                      promises_1.2.0.1             
## [115] gridExtra_2.3                 codetools_0.2-16             
## [117] MASS_7.3-53                   assertthat_0.2.1             
## [119] leidenbase_0.1.4              openssl_1.4.6                
## [121] withr_2.4.3                   GenomicAlignments_1.26.0     
## [123] Rsamtools_2.6.0               GenomeInfoDbData_1.2.4       
## [125] hms_1.1.1                     grid_4.0.2                   
## [127] beachmat_2.6.4                rmarkdown_2.11               
## [129] DelayedMatrixStats_1.12.3     shiny_1.7.1                  
## [131] lubridate_1.8.0

11.1 Knit Time

With caches

knitEnd <- Sys.time()
knitEnd - knitStart
## Time difference of 17.09725 mins